{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/vector_stores/LindormDemo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lindorm\n",
    "\n",
    ">[Lindorm](https://www.alibabacloud.com/help/en/lindorm) is a cloud native multi-model database service. It allows you to store data of all sizes. Lindorm supports low-cost storage and processing of large amounts of data and the pay-as-you-go billing method. It is compatible with the open standards of multiple open source software, such as Apache HBase, Apache Cassandra, Apache Phoenix, OpenTSDB, Apache Solr, and SQL.\n",
    "\n",
    "\n",
    "To run this notebook you need a Lindorm instance running in the cloud. You can get one following [this link](https://alibabacloud.com/help/en/lindorm/latest/create-an-instance).\n",
    "\n",
    "After creating the instance, you can get your instance [information](https://www.alibabacloud.com/help/en/lindorm/latest/view-endpoints) and run [curl commands](https://www.alibabacloud.com/help/en/lindorm/latest/connect-and-use-the-search-engine-with-the-curl-command) to connect to and use LindormSearch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to ensure you have `llama-index` installed:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install opensearch-py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-vector-stores-lindorm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# choose dashscope as embedding and llm model, your can also use default openai or other model to test\n",
    "%pip install llama-index-embeddings-dashscope\n",
    "%pip install llama-index-llms-dashscope"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "import needed package dependencies:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import SimpleDirectoryReader\n",
    "from llama_index.vector_stores.lindorm import (\n",
    "    LindormVectorStore,\n",
    "    LindormVectorClient,\n",
    ")\n",
    "from llama_index.core import VectorStoreIndex, StorageContext"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Config dashscope embedding and llm model, your can also use default openai or other model to test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set Embbeding model\n",
    "from llama_index.core import Settings\n",
    "from llama_index.embeddings.dashscope import DashScopeEmbedding\n",
    "\n",
    "# Global Settings\n",
    "Settings.embed_model = DashScopeEmbedding()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# config llm model\n",
    "from llama_index.llms.dashscope import DashScope, DashScopeGenerationModels\n",
    "\n",
    "dashscope_llm = DashScope(model_name=DashScopeGenerationModels.QWEN_MAX)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download example data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2024-07-10 14:01:02--  https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt\n",
      "正在解析主机 raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.111.133, ...\n",
      "正在连接 raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... 已连接。\n",
      "已发出 HTTP 请求，正在等待回应... 200 OK\n",
      "长度：75042 (73K) [text/plain]\n",
      "正在保存至: “data/paul_graham/paul_graham_essay.txt”\n",
      "\n",
      "data/paul_graham/pa 100%[===================>]  73.28K  43.2KB/s  用时 1.7s      \n",
      "\n",
      "2024-07-10 14:01:04 (43.2 KB/s) - 已保存 “data/paul_graham/paul_graham_essay.txt” [75042/75042])\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Total documents: 1\n",
      "First document, id: 5ddae8c1-f137-4500-83cd-e38e42d4f72b\n",
      "First document, hash: 8fde8a692925d317c5544f3dbaa88eeb5e9ec0cbdb74da1de19d57ee75ac0c3c\n",
      "First document, text (75014 characters):\n",
      "====================\n",
      "\n",
      "\n",
      "What I Worked On\n",
      "\n",
      "February 2021\n",
      "\n",
      "Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined ma ...\n"
     ]
    }
   ],
   "source": [
    "# load documents\n",
    "documents = SimpleDirectoryReader(\"./data/paul_graham/\").load_data()\n",
    "print(f\"Total documents: {len(documents)}\")\n",
    "print(f\"First document, id: {documents[0].doc_id}\")\n",
    "print(f\"First document, hash: {documents[0].hash}\")\n",
    "print(\n",
    "    \"First document, text\"\n",
    "    f\" ({len(documents[0].text)} characters):\\n{'='*20}\\n{documents[0].text[:360]} ...\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create the Lindorm Vector Store object:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# only for jupyter notebook\n",
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()\n",
    "\n",
    "# lindorm instance info\n",
    "host = \"ld-bp******jm*******-proxy-search-pub.lindorm.aliyuncs.com\"\n",
    "port = 30070\n",
    "username = \"your_username\"\n",
    "password = \"your_password\"\n",
    "\n",
    "\n",
    "# index demonstrate the VectorStore impl\n",
    "index_name = \"lindorm_rag_test\"\n",
    "\n",
    "# extenion param of lindorm search, number of cluster units to query; between 1 and method.parameters.nlist(ivfpq param); no default value.\n",
    "nprobe = \"2\"\n",
    "\n",
    "# extenion param of lindorm search, usually used to improve recall accuracy, but it increases performance overhead; between 1 and 200; default: 10.\n",
    "reorder_factor = \"10\"\n",
    "\n",
    "#  LindormVectorClient encapsulates logic for a single index with vector search enabled\n",
    "client = LindormVectorClient(\n",
    "    host,\n",
    "    port,\n",
    "    username,\n",
    "    password,\n",
    "    index=index_name,\n",
    "    dimension=1536,  # match dimension of your embedding model\n",
    "    nprobe=nprobe,\n",
    "    reorder_factor=reorder_factor,\n",
    "    # filter_type=\"pre_filter/post_filter(default)\"\n",
    ")\n",
    "\n",
    "# initialize vector store\n",
    "vector_store = LindormVectorStore(client)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Build the Index from the Documents:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/guoguo/Library/Python/3.9/lib/python/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 25.27it/s]\n",
      "Generating embeddings: 100%|██████████| 22/22 [00:02<00:00, 10.31it/s]\n"
     ]
    }
   ],
   "source": [
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "\n",
    "# initialize an index using our sample data and the client we just created\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents=documents, storage_context=storage_context, show_progress=True\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Querying the store:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Search Test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "---------------------------------------------\n",
      "Score: 0.448\n",
      "What I Worked On\n",
      "\n",
      "February 2021\n",
      "\n",
      "Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.\n",
      "\n",
      "The first programs I tried writing were on the IBM 1401 that our school district used for what was then called \"data processing.\" This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.\n",
      "\n",
      "The language we used was an early version of Fortran. You had to type programs on punch cards, then stack them in the card reader and press a button to load the program into memory and run it. The result would ordinarily be to print something on the spectacularly loud printer.\n",
      "\n",
      "I was puzzled by the 1401. I couldn't figure out what to do with it. And in retrospect there's not much I could have done with it. The only form of input to programs was data stored on punched cards, and I didn't have any data stored on punched cards. The only other option was to do things that didn't rely on any input, like calculate approximations of pi, but I didn't know enough math to do anything interesting of that type. So I'm not surprised I can't remember any programs I wrote, because they can't have done much. My clearest memory is of the moment I learned it was possible for programs not to terminate, when one of mine didn't. On a machine without time-sharing, this was a social as well as a technical error, as the data center manager's expression made clear.\n",
      "\n",
      "With microcomputers, everything changed. Now you could have a computer sitting right in front of you, on a desk, that could respond to your keystrokes as it was running instead of just churning through a stack of punch cards and then stopping. [1]\n",
      "\n",
      "The first of my friends to get a microcomputer built it himself. It was sold as a kit by Heathkit. I remember vividly how impressed and envious I felt watching him sitting in front of it, typing programs right into the computer.\n",
      "\n",
      "Computers were expensive in those days and it took me years of nagging before I convinced my father to buy one, a TRS-80, in about 1980. The gold standard then was the Apple II, but a TRS-80 was good enough. This was when I really started programming. I wrote simple games, a program to predict how high my model rockets would fly, and a word processor that my father used to write at least one book. There was only room in memory for about 2 pages of text, so he'd write 2 pages at a time and then print them out, but it was a lot better than a typewriter.\n",
      "\n",
      "Though I liked programming, I didn't plan to study it in college. In college I was going to study philosophy, which sounded much more powerful. It seemed, to my naive high school self, to be the study of the ultimate truths, compared to which the things studied in other fields would be mere domain knowledge. What I discovered when I got to college was that the other fields took up so much of the space of ideas that there wasn't much left for these supposed ultimate truths. All that seemed left for philosophy were edge cases that people in other fields felt could safely be ignored.\n",
      "\n",
      "I couldn't have put this into words when I was 18. All I knew at the time was that I kept taking philosophy courses and they kept being boring. So I decided to switch to AI.\n",
      "\n",
      "AI was in the air in the mid 1980s, but there were two things especially that made me want to work on it: a novel by Heinlein called The Moon is a Harsh Mistress, which featured an intelligent computer called Mike, and a PBS documentary that showed Terry Winograd using SHRDLU. I haven't tried rereading The Moon is a Harsh Mistress, so I don't know how well it has aged, but when I read it I was drawn entirely into its world.\n",
      "---------------------------------------------\n",
      "\n",
      "\n",
      "---------------------------------------------\n",
      "Score: 0.434\n",
      "It was not, in fact, simply a matter of teaching SHRDLU more words. That whole way of doing AI, with explicit data structures representing concepts, was not going to work. Its brokenness did, as so often happens, generate a lot of opportunities to write papers about various band-aids that could be applied to it, but it was never going to get us Mike.\n",
      "\n",
      "So I looked around to see what I could salvage from the wreckage of my plans, and there was Lisp. I knew from experience that Lisp was interesting for its own sake and not just for its association with AI, even though that was the main reason people cared about it at the time. So I decided to focus on Lisp. In fact, I decided to write a book about Lisp hacking. It's scary to think how little I knew about Lisp hacking when I started writing that book. But there's nothing like writing a book about something to help you learn it. The book, On Lisp, wasn't published till 1993, but I wrote much of it in grad school.\n",
      "\n",
      "Computer Science is an uneasy alliance between two halves, theory and systems. The theory people prove things, and the systems people build things. I wanted to build things. I had plenty of respect for theory — indeed, a sneaking suspicion that it was the more admirable of the two halves — but building things seemed so much more exciting.\n",
      "\n",
      "The problem with systems work, though, was that it didn't last. Any program you wrote today, no matter how good, would be obsolete in a couple decades at best. People might mention your software in footnotes, but no one would actually use it. And indeed, it would seem very feeble work. Only people with a sense of the history of the field would even realize that, in its time, it had been good.\n",
      "\n",
      "There were some surplus Xerox Dandelions floating around the computer lab at one point. Anyone who wanted one to play around with could have one. I was briefly tempted, but they were so slow by present standards; what was the point? No one else wanted one either, so off they went. That was what happened to systems work.\n",
      "\n",
      "I wanted not just to build things, but to build things that would last.\n",
      "\n",
      "In this dissatisfied state I went in 1988 to visit Rich Draves at CMU, where he was in grad school. One day I went to visit the Carnegie Institute, where I'd spent a lot of time as a kid. While looking at a painting there I realized something that might seem obvious, but was a big surprise to me. There, right on the wall, was something you could make that would last. Paintings didn't become obsolete. Some of the best ones were hundreds of years old.\n",
      "\n",
      "And moreover this was something you could make a living doing. Not as easily as you could by writing software, of course, but I thought if you were really industrious and lived really cheaply, it had to be possible to make enough to survive. And as an artist you could be truly independent. You wouldn't have a boss, or even need to get research funding.\n",
      "\n",
      "I had always liked looking at paintings. Could I make them? I had no idea. I'd never imagined it was even possible. I knew intellectually that people made art — that it didn't just appear spontaneously — but it was as if the people who made it were a different species. They either lived long ago or were mysterious geniuses doing strange things in profiles in Life magazine. The idea of actually being able to make art, to put that verb before that noun, seemed almost miraculous.\n",
      "\n",
      "That fall I started taking art classes at Harvard. Grad students could take classes in any department, and my advisor, Tom Cheatham, was very easy going. If he even knew about the strange classes I was taking, he never said anything.\n",
      "\n",
      "So now I was in a PhD program in computer science, yet planning to be an artist, yet also genuinely in love with Lisp hacking and working away at On Lisp. In other words, like many a grad student, I was working energetically on multiple projects that were not my thesis.\n",
      "\n",
      "I didn't see a way out of this situation. I didn't want to drop out of grad school, but how else was I going to get out? I remember when my friend Robert Morris got kicked out of Cornell for writing the internet worm of 1988, I was envious that he'd found such a spectacular way to get out of grad school.\n",
      "\n",
      "Then one day in April 1990 a crack appeared in the wall.\n",
      "---------------------------------------------\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Set Retriever\n",
    "vector_retriever = index.as_retriever()\n",
    "# search\n",
    "source_nodes = vector_retriever.retrieve(\"What did the author do growing up?\")\n",
    "# check source_nodes\n",
    "for node in source_nodes:\n",
    "    # print(node.metadata)\n",
    "    print(f\"---------------------------------------------\")\n",
    "    print(f\"Score: {node.score:.3f}\")\n",
    "    print(node.get_content())\n",
    "    print(f\"---------------------------------------------\\n\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Basic Querying"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Growing up, the author worked on two main activities outside of school: writing and programming. They wrote short stories instead of essays, and their early programming attempts involved using an IBM 1401 computer with Fortran, despite the challenges posed by the limited input methods and their lack of sophisticated mathematical knowledge.'"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# run query\n",
    "query_engine = index.as_query_engine(llm=dashscope_llm)\n",
    "# query_engine = index.as_query_engine()\n",
    "res = query_engine.query(\"What did the author do growing up?\")\n",
    "res.response"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Metadata Filtering"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lindorm Vector Store now supports metadata filtering in the form of exact-match `key=value` pairs and range fliter in the form of `>`、`<`、`>=`、`<=` at query time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import Document\n",
    "from llama_index.core.vector_stores import (\n",
    "    MetadataFilters,\n",
    "    MetadataFilter,\n",
    "    FilterOperator,\n",
    "    FilterCondition,\n",
    ")\n",
    "import regex as re"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Split the text into paragraphs.\n",
    "text_chunks = documents[0].text.split(\"\\n\\n\")\n",
    "\n",
    "# Create a document for each footnote\n",
    "footnotes = [\n",
    "    Document(\n",
    "        text=chunk,\n",
    "        id=documents[0].doc_id,\n",
    "        metadata={\n",
    "            \"is_footnote\": bool(re.search(r\"^\\s*\\[\\d+\\]\\s*\", chunk)),\n",
    "            \"mark_id\": i,\n",
    "        },\n",
    "    )\n",
    "    for i, chunk in enumerate(text_chunks)\n",
    "    if bool(re.search(r\"^\\s*\\[\\d+\\]\\s*\", chunk))\n",
    "]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 1140.07it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 506.62it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 957.82it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 1170.94it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 1043.88it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 1337.47it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 1055.97it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 1331.10it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 1408.43it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 1081.84it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 479.68it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 946.15it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 1062.66it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 363.93it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 760.53it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 1027.76it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 470.37it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 513.13it/s]\n",
      "Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 460.81it/s]\n"
     ]
    }
   ],
   "source": [
    "# Insert the footnotes into the index\n",
    "for f in footnotes:\n",
    "    index.insert(f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[NodeWithScore(node=TextNode(id_='c307ea8e-3647-43f0-9858-34581cc50ce5', embedding=None, metadata={'ref_doc_id': 'd9f7000e-412c-466d-9792-88b15aad7148', 'mark_id': 173, 'is_footnote': True, 'document_id': 'd9f7000e-412c-466d-9792-88b15aad7148', '_node_type': 'TextNode', 'doc_id': 'd9f7000e-412c-466d-9792-88b15aad7148', 'content': \"[19] One way to get more precise about the concept of invented vs discovered is to talk about space aliens. Any sufficiently advanced alien civilization would certainly know about the Pythagorean theorem, for example. I believe, though with less certainty, that they would also know about the Lisp in McCarthy's 1960 paper.\", '_node_content': '{\"id_\": \"c307ea8e-3647-43f0-9858-34581cc50ce5\", \"embedding\": null, \"metadata\": {\"is_footnote\": true, \"mark_id\": 173}, \"excluded_embed_metadata_keys\": [], \"excluded_llm_metadata_keys\": [], \"relationships\": {\"1\": {\"node_id\": \"d9f7000e-412c-466d-9792-88b15aad7148\", \"node_type\": \"4\", \"metadata\": {\"is_footnote\": true, \"mark_id\": 173}, \"hash\": \"b43f450088029936fd7a03f5917ff9c487ba2e3ed9c6c22de43e024a67f8f48e\", \"class_name\": \"RelatedNodeInfo\"}}, \"text\": \"\", \"mimetype\": \"text/plain\", \"start_char_idx\": 0, \"end_char_idx\": 323, \"text_template\": \"{metadata_str}\\\\n\\\\n{content}\", \"metadata_template\": \"{key}: {value}\", \"metadata_seperator\": \"\\\\n\", \"class_name\": \"TextNode\"}'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text=\"[19] One way to get more precise about the concept of invented vs discovered is to talk about space aliens. Any sufficiently advanced alien civilization would certainly know about the Pythagorean theorem, for example. I believe, though with less certainty, that they would also know about the Lisp in McCarthy's 1960 paper.\", mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=0.5169236)]\n"
     ]
    }
   ],
   "source": [
    "retriever = index.as_retriever(\n",
    "    filters=MetadataFilters(\n",
    "        filters=[\n",
    "            MetadataFilter(\n",
    "                key=\"is_footnote\", value=\"true\", operator=FilterOperator.EQ\n",
    "            ),\n",
    "            MetadataFilter(\n",
    "                key=\"mark_id\", value=0, operator=FilterOperator.GTE\n",
    "            ),\n",
    "        ],\n",
    "        condition=FilterCondition.AND,\n",
    "    ),\n",
    ")\n",
    "\n",
    "result = retriever.retrieve(\"What did the author about space aliens and lisp?\")\n",
    "\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"The author suggests that any sufficiently advanced alien civilization would be aware of the Pythagorean theorem and, albeit with less certainty, they would also be familiar with Lisp as described in McCarthy's 1960 paper.\""
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a query engine that only searches certain footnotes.\n",
    "footnote_query_engine = index.as_query_engine(\n",
    "    filters=MetadataFilters(\n",
    "        filters=[\n",
    "            MetadataFilter(\n",
    "                key=\"is_footnote\", value=\"true\", operator=FilterOperator.EQ\n",
    "            ),\n",
    "            MetadataFilter(\n",
    "                key=\"mark_id\", value=0, operator=FilterOperator.GTE\n",
    "            ),\n",
    "        ],\n",
    "        condition=FilterCondition.AND,\n",
    "    ),\n",
    "    llm=dashscope_llm,\n",
    ")\n",
    "\n",
    "res = footnote_query_engine.query(\n",
    "    \"What did the author about space aliens and lisp?\"\n",
    ")\n",
    "res.response"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Hybrid Search"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Lindorm search support hybrid search, note the minimum search granularity of query str is one token."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[NodeWithScore(node=TextNode(id_='c307ea8e-3647-43f0-9858-34581cc50ce5', embedding=None, metadata={'ref_doc_id': 'd9f7000e-412c-466d-9792-88b15aad7148', 'mark_id': 173, 'is_footnote': True, 'document_id': 'd9f7000e-412c-466d-9792-88b15aad7148', '_node_type': 'TextNode', 'doc_id': 'd9f7000e-412c-466d-9792-88b15aad7148', 'content': \"[19] One way to get more precise about the concept of invented vs discovered is to talk about space aliens. Any sufficiently advanced alien civilization would certainly know about the Pythagorean theorem, for example. I believe, though with less certainty, that they would also know about the Lisp in McCarthy's 1960 paper.\", '_node_content': '{\"id_\": \"c307ea8e-3647-43f0-9858-34581cc50ce5\", \"embedding\": null, \"metadata\": {\"is_footnote\": true, \"mark_id\": 173}, \"excluded_embed_metadata_keys\": [], \"excluded_llm_metadata_keys\": [], \"relationships\": {\"1\": {\"node_id\": \"d9f7000e-412c-466d-9792-88b15aad7148\", \"node_type\": \"4\", \"metadata\": {\"is_footnote\": true, \"mark_id\": 173}, \"hash\": \"b43f450088029936fd7a03f5917ff9c487ba2e3ed9c6c22de43e024a67f8f48e\", \"class_name\": \"RelatedNodeInfo\"}}, \"text\": \"\", \"mimetype\": \"text/plain\", \"start_char_idx\": 0, \"end_char_idx\": 323, \"text_template\": \"{metadata_str}\\\\n\\\\n{content}\", \"metadata_template\": \"{key}: {value}\", \"metadata_seperator\": \"\\\\n\", \"class_name\": \"TextNode\"}'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text=\"[19] One way to get more precise about the concept of invented vs discovered is to talk about space aliens. Any sufficiently advanced alien civilization would certainly know about the Pythagorean theorem, for example. I believe, though with less certainty, that they would also know about the Lisp in McCarthy's 1960 paper.\", mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=0.5169236), NodeWithScore(node=TextNode(id_='4252133e-d45a-49ff-9206-30a946716e83', embedding=None, metadata={'file_path': '/Users/guoguo/code/llama_git/fork/llama_index/docs/examples/vector_stores/data/paul_graham/paul_graham_essay.txt', 'ref_doc_id': '5ddae8c1-f137-4500-83cd-e38e42d4f72b', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'last_modified_date': '2024-07-10', 'creation_date': '2024-07-10', 'document_id': '5ddae8c1-f137-4500-83cd-e38e42d4f72b', '_node_type': 'TextNode', 'doc_id': '5ddae8c1-f137-4500-83cd-e38e42d4f72b', 'content': \"It was not, in fact, simply a matter of teaching SHRDLU more words. That whole way of doing AI, with explicit data structures representing concepts, was not going to work. Its brokenness did, as so often happens, generate a lot of opportunities to write papers about various band-aids that could be applied to it, but it was never going to get us Mike.\\n\\nSo I looked around to see what I could salvage from the wreckage of my plans, and there was Lisp. I knew from experience that Lisp was interesting for its own sake and not just for its association with AI, even though that was the main reason people cared about it at the time. So I decided to focus on Lisp. In fact, I decided to write a book about Lisp hacking. It's scary to think how little I knew about Lisp hacking when I started writing that book. But there's nothing like writing a book about something to help you learn it. The book, On Lisp, wasn't published till 1993, but I wrote much of it in grad school.\\n\\nComputer Science is an uneasy alliance between two halves, theory and systems. The theory people prove things, and the systems people build things. I wanted to build things. I had plenty of respect for theory — indeed, a sneaking suspicion that it was the more admirable of the two halves — but building things seemed so much more exciting.\\n\\nThe problem with systems work, though, was that it didn't last. Any program you wrote today, no matter how good, would be obsolete in a couple decades at best. People might mention your software in footnotes, but no one would actually use it. And indeed, it would seem very feeble work. Only people with a sense of the history of the field would even realize that, in its time, it had been good.\\n\\nThere were some surplus Xerox Dandelions floating around the computer lab at one point. Anyone who wanted one to play around with could have one. I was briefly tempted, but they were so slow by present standards; what was the point? No one else wanted one either, so off they went. That was what happened to systems work.\\n\\nI wanted not just to build things, but to build things that would last.\\n\\nIn this dissatisfied state I went in 1988 to visit Rich Draves at CMU, where he was in grad school. One day I went to visit the Carnegie Institute, where I'd spent a lot of time as a kid. While looking at a painting there I realized something that might seem obvious, but was a big surprise to me. There, right on the wall, was something you could make that would last. Paintings didn't become obsolete. Some of the best ones were hundreds of years old.\\n\\nAnd moreover this was something you could make a living doing. Not as easily as you could by writing software, of course, but I thought if you were really industrious and lived really cheaply, it had to be possible to make enough to survive. And as an artist you could be truly independent. You wouldn't have a boss, or even need to get research funding.\\n\\nI had always liked looking at paintings. Could I make them? I had no idea. I'd never imagined it was even possible. I knew intellectually that people made art — that it didn't just appear spontaneously — but it was as if the people who made it were a different species. They either lived long ago or were mysterious geniuses doing strange things in profiles in Life magazine. The idea of actually being able to make art, to put that verb before that noun, seemed almost miraculous.\\n\\nThat fall I started taking art classes at Harvard. Grad students could take classes in any department, and my advisor, Tom Cheatham, was very easy going. If he even knew about the strange classes I was taking, he never said anything.\\n\\nSo now I was in a PhD program in computer science, yet planning to be an artist, yet also genuinely in love with Lisp hacking and working away at On Lisp. In other words, like many a grad student, I was working energetically on multiple projects that were not my thesis.\\n\\nI didn't see a way out of this situation. I didn't want to drop out of grad school, but how else was I going to get out? I remember when my friend Robert Morris got kicked out of Cornell for writing the internet worm of 1988, I was envious that he'd found such a spectacular way to get out of grad school.\\n\\nThen one day in April 1990 a crack appeared in the wall.\", 'file_size': 75042, '_node_content': '{\"id_\": \"4252133e-d45a-49ff-9206-30a946716e83\", \"embedding\": null, \"metadata\": {\"file_path\": \"/Users/guoguo/code/llama_git/fork/llama_index/docs/examples/vector_stores/data/paul_graham/paul_graham_essay.txt\", \"file_name\": \"paul_graham_essay.txt\", \"file_type\": \"text/plain\", \"file_size\": 75042, \"creation_date\": \"2024-07-10\", \"last_modified_date\": \"2024-07-10\"}, \"excluded_embed_metadata_keys\": [\"file_name\", \"file_type\", \"file_size\", \"creation_date\", \"last_modified_date\", \"last_accessed_date\"], \"excluded_llm_metadata_keys\": [\"file_name\", \"file_type\", \"file_size\", \"creation_date\", \"last_modified_date\", \"last_accessed_date\"], \"relationships\": {\"1\": {\"node_id\": \"5ddae8c1-f137-4500-83cd-e38e42d4f72b\", \"node_type\": \"4\", \"metadata\": {\"file_path\": \"/Users/guoguo/code/llama_git/fork/llama_index/docs/examples/vector_stores/data/paul_graham/paul_graham_essay.txt\", \"file_name\": \"paul_graham_essay.txt\", \"file_type\": \"text/plain\", \"file_size\": 75042, \"creation_date\": \"2024-07-10\", \"last_modified_date\": \"2024-07-10\"}, \"hash\": \"8fde8a692925d317c5544f3dbaa88eeb5e9ec0cbdb74da1de19d57ee75ac0c3c\", \"class_name\": \"RelatedNodeInfo\"}, \"2\": {\"node_id\": \"2ced3a1a-145e-4385-9dc0-1db6fc1c6380\", \"node_type\": \"1\", \"metadata\": {\"file_path\": \"/Users/guoguo/code/llama_git/fork/llama_index/docs/examples/vector_stores/data/paul_graham/paul_graham_essay.txt\", \"file_name\": \"paul_graham_essay.txt\", \"file_type\": \"text/plain\", \"file_size\": 75042, \"creation_date\": \"2024-07-10\", \"last_modified_date\": \"2024-07-10\"}, \"hash\": \"9ea48d733470e38167724bb328f410ee310a1823df667fe2fa78d9ac5aee1e9b\", \"class_name\": \"RelatedNodeInfo\"}, \"3\": {\"node_id\": \"b9b09870-58e1-4501-acd3-6254d97d0864\", \"node_type\": \"1\", \"metadata\": {}, \"hash\": \"6ceb89cb4f34e38e01c21ddc09378cd5c8cd8b201edf8fccb6ac8cc82b471676\", \"class_name\": \"RelatedNodeInfo\"}}, \"text\": \"\", \"mimetype\": \"text/plain\", \"start_char_idx\": 6812, \"end_char_idx\": 11085, \"text_template\": \"{metadata_str}\\\\n\\\\n{content}\", \"metadata_template\": \"{key}: {value}\", \"metadata_seperator\": \"\\\\n\", \"class_name\": \"TextNode\"}'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text=\"It was not, in fact, simply a matter of teaching SHRDLU more words. That whole way of doing AI, with explicit data structures representing concepts, was not going to work. Its brokenness did, as so often happens, generate a lot of opportunities to write papers about various band-aids that could be applied to it, but it was never going to get us Mike.\\n\\nSo I looked around to see what I could salvage from the wreckage of my plans, and there was Lisp. I knew from experience that Lisp was interesting for its own sake and not just for its association with AI, even though that was the main reason people cared about it at the time. So I decided to focus on Lisp. In fact, I decided to write a book about Lisp hacking. It's scary to think how little I knew about Lisp hacking when I started writing that book. But there's nothing like writing a book about something to help you learn it. The book, On Lisp, wasn't published till 1993, but I wrote much of it in grad school.\\n\\nComputer Science is an uneasy alliance between two halves, theory and systems. The theory people prove things, and the systems people build things. I wanted to build things. I had plenty of respect for theory — indeed, a sneaking suspicion that it was the more admirable of the two halves — but building things seemed so much more exciting.\\n\\nThe problem with systems work, though, was that it didn't last. Any program you wrote today, no matter how good, would be obsolete in a couple decades at best. People might mention your software in footnotes, but no one would actually use it. And indeed, it would seem very feeble work. Only people with a sense of the history of the field would even realize that, in its time, it had been good.\\n\\nThere were some surplus Xerox Dandelions floating around the computer lab at one point. Anyone who wanted one to play around with could have one. I was briefly tempted, but they were so slow by present standards; what was the point? No one else wanted one either, so off they went. That was what happened to systems work.\\n\\nI wanted not just to build things, but to build things that would last.\\n\\nIn this dissatisfied state I went in 1988 to visit Rich Draves at CMU, where he was in grad school. One day I went to visit the Carnegie Institute, where I'd spent a lot of time as a kid. While looking at a painting there I realized something that might seem obvious, but was a big surprise to me. There, right on the wall, was something you could make that would last. Paintings didn't become obsolete. Some of the best ones were hundreds of years old.\\n\\nAnd moreover this was something you could make a living doing. Not as easily as you could by writing software, of course, but I thought if you were really industrious and lived really cheaply, it had to be possible to make enough to survive. And as an artist you could be truly independent. You wouldn't have a boss, or even need to get research funding.\\n\\nI had always liked looking at paintings. Could I make them? I had no idea. I'd never imagined it was even possible. I knew intellectually that people made art — that it didn't just appear spontaneously — but it was as if the people who made it were a different species. They either lived long ago or were mysterious geniuses doing strange things in profiles in Life magazine. The idea of actually being able to make art, to put that verb before that noun, seemed almost miraculous.\\n\\nThat fall I started taking art classes at Harvard. Grad students could take classes in any department, and my advisor, Tom Cheatham, was very easy going. If he even knew about the strange classes I was taking, he never said anything.\\n\\nSo now I was in a PhD program in computer science, yet planning to be an artist, yet also genuinely in love with Lisp hacking and working away at On Lisp. In other words, like many a grad student, I was working energetically on multiple projects that were not my thesis.\\n\\nI didn't see a way out of this situation. I didn't want to drop out of grad school, but how else was I going to get out? I remember when my friend Robert Morris got kicked out of Cornell for writing the internet worm of 1988, I was envious that he'd found such a spectacular way to get out of grad school.\\n\\nThen one day in April 1990 a crack appeared in the wall.\", mimetype='text/plain', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=0.5134669)]\n"
     ]
    }
   ],
   "source": [
    "from llama_index.core.vector_stores.types import VectorStoreQueryMode\n",
    "\n",
    "retriever = index.as_retriever(\n",
    "    vector_store_query_mode=VectorStoreQueryMode.HYBRID\n",
    ")\n",
    "\n",
    "result = retriever.retrieve(\"What did the author about space aliens and lisp?\")\n",
    "\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"The author believes that any sufficiently advanced alien civilization would know about fundamental mathematical concepts like the Pythagorean theorem. They also express, with less certainty, the idea that these aliens would be familiar with Lisp, a programming language discussed in McCarthy's 1960 paper. This thought experiment serves as a way to explore the distinction between ideas that are invented versus discovered.\""
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query_engine = index.as_query_engine(\n",
    "    llm=dashscope_llm, vector_store_query_mode=VectorStoreQueryMode.HYBRID\n",
    ")\n",
    "res = query_engine.query(\"What did the author about space aliens and lisp?\")\n",
    "res.response"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
